Reading and Processing .RVG binary files
Hello and welcome to my attempt to read some “simple” binary files.
The goal here:
I need to perform some pre-processing on a set of binary .rvg files (containing trajectory output from a reduced-dynamics run of GEODYN) and stitch them together to construct a GEODYN-specific, trajectory-based tracking data type (called PCE).
Each binary (.rvg) file contains a 54-hour arc of ICESat2 trajectory data. These datasets are the output from a very precise run of GEODYN in which the orbit has been determined very well (to a few centimeters accuracy) using a reduced dynamics technique (empirical accelerations and other parameters are adjusted to account for any mismodelled forces).
Methodology for pre-processing the binary files:
1. Dump the data from each arc into some usable format.
2. Chop off the 3-hour padding on the ends to eliminate discontinuities from end effects.
3. Stitch together all the files (I provide some for this notebook as examples).
4. Smooth over any existing discontinuities between arc gaps or maneuvers.
5. Put all the data into a single TRAJ.txt file to be converted to a GEODYN-specific tracking data type.
Info on the binary files:
(from the GEODYN folks at Goddard)

1. These files are in what is called the RVG format. The RVG files are pretty simple to unpack (lol).
2. Each record has 29 words.
3. Each word is a 64-bit floating point number.
4. The first record is a header record with information about the file.
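As a rough sketch of what that layout implies, the whole file can be read as a flat array of 64-bit floats and reshaped into 29-word records. This is a minimal sketch only, assuming the words are packed back-to-back as big-endian doubles with no Fortran record markers (if markers are present, see the scipy.io.FortranFile approach used later); `read_rvg_records` is a hypothetical helper, not the actual reader used below.

```python
import numpy as np

WORDS_PER_RECORD = 29  # each RVG record is 29 eight-byte floats

def read_rvg_records(filename):
    """Read an RVG file as an (n_records, 29) array of float64 words.

    Assumption: big-endian IEEE doubles packed with no Fortran record
    markers. Adjust the dtype (e.g. '<f8') if your files differ.
    """
    raw = np.fromfile(filename, dtype='>f8')
    n_records = raw.size // WORDS_PER_RECORD
    return raw[:n_records * WORDS_PER_RECORD].reshape(n_records, WORDS_PER_RECORD)
```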
```
#| Header Record Format:
#| ---------------------
#|
#| WORD | Type | Description
#| ---- ---- -----------
#| 1 DP Coord. Sys. Flag
#| 0 = TOD
#| 1 = TOR
#| 2 = J2000
#| 2 DP Traj start date MJDSEC GPS
#| 3 DP Traj start frac sec
#| 4 DP Traj start date (YYMMDDHHMMSS) UTC
#| 5 DP Traj stop date MJDSEC GPS
#| 6 DP Traj stop frac sec
#| 7 DP Traj stop date (YYMMDDHHMMSS) UTC
#| 8 DP Traj interval sec
#| 9 DP GEODYN 2s version no.
#| 10 DP GEODYN 2s run date
#| 11 DP GEODYN 2s run time
#| 12 DP GEODYN 2e version no.
#| 13 DP GEODYN 2e run date
#| 14 DP GEODYN 2e run time
#| 15 DP Speed of light
#| 16 DP GM for Earth
#| 17 DP Semi-major axis of Earth ref. ellipsoid
#| 18 DP Equatorial Flattening of Earth ref. ellipsoid
#| 19 DP Gravitational Potential Checksum
#| 20 DP Maximum Degree of Gravitational Expansion
#| 21 DP Maximum Order Of Gravitational Expansion
#| 22-29 DP spares
```
The last record is a sentinel record to tell you that you have reached the end of the file.
```
#| Sentinel Record Format:
#| ---------------------
#|
#| WORD | Type | Description
#| ---- ---- -----------
#| 1 DP 999.0
#| 2 DP Satellite ID
#| 3 DP GEODYN IIS Versions
#| 4 DP GEODYN IIE Versions
#| 5-29 DP 0.0
```
The first word of that record has the value 999.0. In other words, when you encounter a record whose first word has the value 999.0, you know you have reached the end of the file.
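If the files do carry Fortran unformatted record markers, scipy.io.FortranFile (used in section 1 below) can walk them record by record until that sentinel appears. This is a sketch under the assumption of big-endian doubles with 4-byte big-endian markers; the actual `read_rvg_binary` helper used below may do things differently.

```python
import numpy as np
from scipy.io import FortranFile

def read_rvg_with_fortranfile(filename):
    """Collect RVG records until the sentinel (first word = 999.0).

    Assumption: big-endian doubles with 4-byte big-endian Fortran
    record markers; adjust header_dtype/dtype if the files differ.
    """
    f = FortranFile(filename, 'r', header_dtype='>u4')
    records = []
    try:
        while True:
            rec = f.read_record('>f8')
            records.append(rec)
            if rec[0] == 999.0:  # sentinel record marks end of file
                break
    finally:
        f.close()
    return np.vstack(records)
```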
All the records in the file, except the first and last, are data records.
```
#| Data Record Format:
#| ---------------------
#|
#| WORD | Type | Description
#| ---- ---- -----------
#| 1 DP MJDSEC (secs) % time is in GPS
#| 2 DP RSEC (fractional secs)
#| 3 DP UTC - GPS offset (secs)
#| 4 DP spare_4
#| 5 DP X Inertial sat. S.Vec (m)
#| 6 DP Y Inertial sat. S.Vec (m)
#| 7 DP Z Inertial sat. S.Vec (m)
#| 8 DP X_dot Inertial sat. S.Vec (m/sec)
#| 9 DP Y_dot Inertial sat. S.Vec (m/sec)
#| 10 DP Z_dot Inertial sat. S.Vec (m/sec)
#| 11 DP Satellite latitude (degrees)
#| 12 DP Satellite longitude (degrees)
#| 13 DP Satellite height (m)
#| 14 DP X-component ECF position (m)
#| 15 DP Y-component ECF position (m)
#| 16 DP Z-component ECF position (m)
#| 17 DP X_dot-component ECF velocity (m/sec)
#| 18 DP Y_dot-component ECF velocity (m/sec)
#| 19 DP Z_dot-component ECF velocity (m/sec)
#| 20 DP X component of polar motion (milliarcsec)
#| 21 DP Y component of polar motion (milliarcsec)
#| 22 DP beta angle (degrees)
#| 23 DP yaw angle (degrees)
#| 24 DP orbit angle (degrees)
#| 25 DP Quaternion Q1 for J2000 to ITRF (ECF)
#| 26 DP Quaternion Q2 for J2000 to ITRF (ECF)
#| 27 DP Quaternion Q3 for J2000 to ITRF (ECF)
#| 28 DP Quaternion Q4 for J2000 to ITRF (ECF)
#| 29 DP Greenwich HR angle
```
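Once the words are in an (n, 29) array, picking out the fields needed downstream amounts to a mapping from column names to word positions. A sketch only: the column names mirror ones used later in this notebook, but the mapping and the `records_to_dataframe` helper are my own, not part of `read_rvg_binary`.

```python
import numpy as np
import pandas as pd

# Word positions for a few fields of interest (1-based in the table
# above, 0-based here). Names are illustrative, not GEODYN's.
DATA_COLUMNS = {
    'MJDSEC_secs_timeGPS': 0,
    'RSEC_fracsecs'      : 1,
    'X_statevector_m'    : 4,
    'Y_statevector_m'    : 5,
    'Z_statevector_m'    : 6,
}

def records_to_dataframe(records):
    """Drop the header and sentinel records, keep selected data words.

    `records` is an (n, 29) float64 array; the first row is the header
    record and the sentinel row starts with 999.0.
    """
    data = records[1:]                  # drop the header record
    data = data[data[:, 0] != 999.0]    # drop the sentinel record
    return pd.DataFrame({name: data[:, i] for name, i in DATA_COLUMNS.items()})
```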
1) Dump the data into a usable format with scipy.io.FortranFile
[1]:
import numpy as np
import pandas as pd
%load_ext autoreload
%autoreload 2
from read_rvg_binary import read_rvg_binary
[2]:
path_to_binaryrvg = '/data/data_geodyn/inputs/icesat2/pre_processing/traj_files_rvg/'
rvg_data1 = read_rvg_binary(29, path_to_binaryrvg + 'orbit.1807001.2018.287')
rvg_data2 = read_rvg_binary(29, path_to_binaryrvg + 'orbit.1807001.2018.288')
There are ~ 109434 records in this file.
End of record
There are ~ 109434 records in this file.
End of record
1.a Look at the data for a few files to see the discontinuities.
Convert MJDS to YYMMDDHHMMSS
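The conversion itself is handled by the local `MJDS_to_YYMMDDHHMMSS` module below; as a rough illustration of the idea, a seconds-counted-from-an-epoch time becomes a datetime by adding it to the epoch. This is a sketch only: I am assuming the standard MJD epoch of 1858-11-17 here, and GEODYN may count its MJDSEC from a different reference date, so `mjds_to_datetime` is a hypothetical helper, not the one used in this notebook.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # assumed reference; GEODYN's may differ

def mjds_to_datetime(mjds, epoch=MJD_EPOCH):
    """Turn a seconds-since-epoch count into a datetime (sketch only)."""
    return epoch + timedelta(seconds=float(mjds))
```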
[4]:
%load_ext autoreload
%autoreload 2
from ModJulianDaySeconds__to__YYMMDDHHMMSS import MJDS_to_YYMMDDHHMMSS
#### TEST to see if the time conversion works
# YYMMDDHHMMSS = MJDS_to_YYMMDDHHMMSS(start_date_MJDSEC_GPS)
# print(pd.to_datetime( YYMMDDHHMMSS, format='%y%m%d-%H%M%S'))
# print(pd.to_datetime( str(int(start_date_YYMMDDHHMMSS_UTC)), format='1%y%m%d%H%M%S'))
def RVG_Files_add_datetime_column(rvg_file):
    # data
    mjdsecs = rvg_file['data']['MJDSEC_secs_timeGPS']
    # convert the MJDS to a usable string
    yymmdd_str = [MJDS_to_YYMMDDHHMMSS(x) for x in mjdsecs]
    # convert string of dates to datetime for plotting
    dates = [pd.to_datetime(x, format='%y%m%d-%H%M%S') for x in yymmdd_str]
    return dates
dates1 = RVG_Files_add_datetime_column(rvg_data1)
dates2 = RVG_Files_add_datetime_column(rvg_data2)
# add the datetime column to the DataFrame within the dictionary
rvg_data1['data'].insert(0, 'Date', dates1)
rvg_data2['data'].insert(0, 'Date', dates2)
The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-9449bc21af09> in <module>
28
29 #add the datetime string the the Dataframe within the dictionary
---> 30 rvg_data1['data'].insert(0, 'Date', dates1)
31 rvg_data2['data'].insert(0, 'Date', dates2)
32
/data/miniconda3/envs/pygeodyn/lib/python3.8/site-packages/pandas/core/frame.py in insert(self, loc, column, value, allow_duplicates)
3758 self._ensure_valid_index(value)
3759 value = self._sanitize_column(column, value, broadcast=False)
-> 3760 self._mgr.insert(loc, column, value, allow_duplicates=allow_duplicates)
3761
3762 def assign(self, **kwargs) -> DataFrame:
/data/miniconda3/envs/pygeodyn/lib/python3.8/site-packages/pandas/core/internals/managers.py in insert(self, loc, item, value, allow_duplicates)
1189 if not allow_duplicates and item in self.items:
1190 # Should this be a different kind of error??
-> 1191 raise ValueError(f"cannot insert {item}, already exists")
1192
1193 if not isinstance(loc, int):
ValueError: cannot insert Date, already exists
We can see below that each file is 29.40027 hours long
[5]:
starttime = rvg_data1['data']['Date'].iloc[0]
endtime = rvg_data1['data']['Date'].iloc[-1]
print('Start: ',starttime)
print('end : ',endtime)
time_diff_days = pd.Timedelta(endtime - starttime).days * 24
time_diff_hours = pd.Timedelta(endtime - starttime).seconds / 3600.0
time_diff_totalhours = time_diff_days + time_diff_hours
print('Each file contains' ,time_diff_totalhours,'hours worth of data')
Start: 2018-10-13 21:18:17
end : 2018-10-15 02:42:18
Each file contains 29.400277777777777 hours worth of data
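The day/hour bookkeeping above can also be collapsed into a single `total_seconds()` call, which avoids splitting the Timedelta by hand. A small sketch; `span_hours` is my own name, not a function used elsewhere in this notebook.

```python
import pandas as pd

def span_hours(dates):
    """Length of a sorted datetime Series in hours, first to last sample."""
    return (dates.iloc[-1] - dates.iloc[0]).total_seconds() / 3600.0
```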
Find the overlap between two files:
[6]:
from collections import namedtuple
Range = namedtuple('Range', ['start', 'end'])

def time_overlap(file1, file2, verbose=False):
    data1 = file1['data']['Date']
    data2 = file2['data']['Date']
    r1 = Range(start=data1.iloc[0], end=data1.iloc[-1])
    r2 = Range(start=data2.iloc[0], end=data2.iloc[-1])
    latest_start = max(r1.start, r2.start)
    earliest_end = min(r1.end, r2.end)
    overlap = (earliest_end - latest_start).total_seconds() / 3600
    if verbose:
        print('Latest_start', latest_start)
        print('Earliest_end', earliest_end)
        print('Overlap of ', overlap, 'hours')
    return latest_start, earliest_end, overlap
(start_overlap, end_overlap, tot_overlap) = time_overlap(rvg_data1, rvg_data2, verbose=True)
Latest_start 2018-10-14 21:18:17
Earliest_end 2018-10-15 02:42:18
Overlap of 5.400277777777778 hours
[7]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
fig = make_subplots(rows=1, cols=1)

string = 'X_statevector_m'
fig.add_trace(go.Scattergl(x=rvg_data1['data']['Date'].iloc[-20000:],
                           y=rvg_data1['data'][string].iloc[-20000:]*1e2,
                           name="set 1",
                           mode='markers',
                           marker=dict(size=6),
                           ),
              row=1, col=1,
              )
fig.add_trace(go.Scattergl(x=rvg_data2['data']['Date'].iloc[:20000],
                           y=rvg_data2['data'][string].iloc[:20000]*1e2,
                           name="set 2",
                           mode='markers',
                           marker=dict(size=3),
                           ),
              row=1, col=1,
              )
fig.update_layout(
    title="rvg files 1 and 2 overlap",
    yaxis_title="state vector X [cm]",
    xaxis_title="Date",
)
fig.update_layout(legend={'itemsizing': 'constant'})
fig.update_xaxes(tickangle=30)
fig.update_xaxes(exponentformat='power')
fig.update_yaxes(exponentformat='power')
fig.show()
We want to cut off the same amount of time from each file such that they line up perfectly
[8]:
print('Cut same amount off each end of file. So, ',tot_overlap/2, 'hours from each end of file')
Cut same amount off each end of file. So, 2.700138888888889 hours from each end of file
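An alternative to the index-based chopping implemented below is to express the trim directly as a boolean mask on the Date column. This is a sketch assuming a sorted datetime64 'Date' column like the one in these frames; `trim_ends` is a hypothetical helper, not the function used in this notebook.

```python
import pandas as pd

def trim_ends(df, hours, date_col='Date'):
    """Drop `hours` worth of samples from each end of a trajectory frame.

    Sketch only: assumes `df[date_col]` is a sorted datetime64 column.
    """
    pad = pd.Timedelta(hours, unit='hours')
    start = df[date_col].iloc[0] + pad
    end = df[date_col].iloc[-1] - pad
    return df[(df[date_col] >= start) & (df[date_col] <= end)]
```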
[9]:
rvg_data1['data']['Date']
[9]:
0 2018-10-13 21:18:17
1 2018-10-13 21:18:19
2 2018-10-13 21:18:20
3 2018-10-13 21:18:20
4 2018-10-13 21:18:21
...
105836 2018-10-15 02:42:13
105837 2018-10-15 02:42:15
105838 2018-10-15 02:42:15
105839 2018-10-15 02:42:16
105840 2018-10-15 02:42:18
Name: Date, Length: 105841, dtype: datetime64[ns]
[10]:
print('Last time', rvg_data1['data']['Date'].iloc[-1] )
rvg_data1['data']['Date'].iloc[-1] - pd.Timedelta(tot_overlap/2, unit='hours')
Last time 2018-10-15 02:42:18
[10]:
Timestamp('2018-10-15 00:00:17.499999600')
[11]:
def RVGfiles_timeoverlap_GetChoppingTime(file1, file2):
    (_, _, tot_overlap) = time_overlap(file1, file2)
    file1_start = file1['data']['Date'].iloc[0]
    file1_end   = file1['data']['Date'].iloc[-1]
    file2_start = file2['data']['Date'].iloc[0]
    file2_end   = file2['data']['Date'].iloc[-1]
    # shift each file's boundaries inward by half of the total overlap
    file1_new_start = file1_start + pd.Timedelta(tot_overlap/2, unit='hours')
    file1_new_end   = file1_end   - pd.Timedelta(tot_overlap/2, unit='hours')
    file2_new_start = file2_start + pd.Timedelta(tot_overlap/2, unit='hours')
    file2_new_end   = file2_end   - pd.Timedelta(tot_overlap/2, unit='hours')
    return (file1_new_start, file1_new_end, file2_new_start, file2_new_end)
[12]:
file1_new_start, file1_new_end, file2_new_start, file2_new_end = RVGfiles_timeoverlap_GetChoppingTime(rvg_data1, rvg_data2)
print('file1_new_start', file1_new_start)
print('file1_new_end', file1_new_end)
print('file2_new_start', file2_new_start)
print('file2_new_end', file2_new_end)
file1_new_start 2018-10-14 00:00:17.500000400
file1_new_end 2018-10-15 00:00:17.499999600
file2_new_start 2018-10-15 00:00:17.500000400
file2_new_end 2018-10-16 00:00:17.499999600
[13]:
def RVGfiles_chop_the_ends(file1, file2):
    '''
    Chop the overlapping ends off of the two files.
    '''
    file1_new_start, file1_new_end, file2_new_start, file2_new_end = RVGfiles_timeoverlap_GetChoppingTime(file1, file2)
    df1 = file1['data']
    df2 = file2['data']

    ##### Chop the FRONT off of the FIRST file
    # select all the values earlier than our new start and grab the last one
    val1_front  = df1.Date[df1.Date < file1_new_start].iloc[-1]
    indx1_front = df1.Date[df1.Date == val1_front].index.unique()[0]

    ##### Chop the END off of the FIRST file
    # select all the values later than our new end and grab the first one
    val1_end  = df1.Date[df1.Date > file1_new_end].iloc[0]
    indx1_end = df1.Date[df1.Date == val1_end].index.unique()[0]

    ##### Chop the FRONT off of the SECOND file
    # select all the values earlier than our new start and grab the last one
    val2_front  = df2.Date[df2.Date < file2_new_start].iloc[-1]
    indx2_front = df2.Date[df2.Date == val2_front].index.unique()[0]

    ##### Chop the END off of the SECOND file
    # select all the values later than our new end and grab the first one
    val2_end  = df2.Date[df2.Date > file2_new_end].iloc[0]
    indx2_end = df2.Date[df2.Date == val2_end].index.unique()[0]

    df1_new = df1[:indx1_end][indx1_front+1:]  # offset by one index so there is no overlap in time
    df2_new = df2[:indx2_end][indx2_front+1:]  # offset by one index so there is no overlap in time
    return (df1_new, df2_new)
[14]:
(df1_new, df2_new) = RVGfiles_chop_the_ends(rvg_data1, rvg_data2)
Plot again to see how we did
[15]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
fig = make_subplots(rows=1, cols=1)

string = 'X_statevector_m'
fig.add_trace(go.Scattergl(x=df1_new['Date'],
                           y=df1_new[string]*1e2,
                           name="adjusted set 1",
                           mode='markers',
                           marker=dict(size=6),
                           ),
              row=1, col=1,
              )
fig.add_trace(go.Scattergl(x=df2_new['Date'],
                           y=df2_new[string]*1e2,
                           name="adjusted set 2",
                           mode='markers',
                           marker=dict(size=3),
                           ),
              row=1, col=1,
              )
fig.update_layout(
    title="Adjusted Overlap on files 1 and 2",
    yaxis_title="state vector X [cm]",
    xaxis_title="Date",
)
fig.update_layout(legend={'itemsizing': 'constant'})
fig.update_xaxes(tickangle=30)
fig.update_xaxes(exponentformat='power')
fig.update_yaxes(exponentformat='power')
fig.show()
Construct G2B files
Determine files that will go into each arc (files between maneuvers)
```
SATPAR 139 1807001 9.53000000 1514.000
EPOCH 181013210000.0000000181013210000.00000001810160300 00.000
```
The next steps in constructing the g2b_PCE GEODYN input are to construct an ascii text file and run it through ``pce_converter.f``.
To make the g2b binary file:

1. Load several (sample) days of RVG files.
2. Process and combine them into an ascii file titled TRAJ.txt.
3. Feed TRAJ.txt into PCE_converter.f. The fortran code expects the following:
   1. A file titled TRAJ.txt
      - Each record (line) of the TRAJ.txt file has one integer and four floating point words.
      - The integer and the first floating point word form the time tag of the record.
      - The integer of TRAJ.txt is just the first word of the RVG data record converted to an integer.
      - The first floating point of TRAJ.txt is just the sum of words 2 and 3 of the RVG data record.
      - The last three words are the X, Y and Z coordinates of the satellite in the J2000 coordinate system.

All auxiliary files will continue to be hosted in the path, /data/data_geodyn/inputs/icesat2/pre_processing/
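Based on that description, one TRAJ.txt record could be assembled from a 29-word RVG data record like this. A sketch only: the field widths are my own assumptions, since the exact Fortran format the converter reads is not shown here, and `traj_txt_line` is a hypothetical helper.

```python
def traj_txt_line(record):
    """Format one RVG data record (29 floats) as a TRAJ.txt line.

    Integer time tag = word 1; fractional seconds = word 2 + word 3
    (RSEC plus the UTC - GPS offset); then J2000 X, Y, Z (words 5-7).
    The field widths here are illustrative assumptions.
    """
    itime = int(record[0])
    fsec = record[1] + record[2]
    x, y, z = record[4], record[5], record[6]
    return f"{itime:12d} {fsec:15.9f} {x:17.5f} {y:17.5f} {z:17.5f}"
```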
[15]:
###### Notes:
# SATID = 1807001
# epoch_start = '181013210000'
## YYMMDDHHMMSS.mls0000cd
# epoch_end = '181016030000'
## YYMMDDHHMM SS.mls
# 54 hours (2 days 6 hours)
# Oct 13- 2100 DOY=287
# Oct 14- DOY=288
# Oct 15- DOY=289
# Oct 16- 0300 DOY=290
[16]:
arc1_files = ['orbit.1807001.2018.287' ,
'orbit.1807001.2018.288' ,
'orbit.1807001.2018.289A',
'orbit.1807001.2018.290A',
'orbit.1807001.2018.291',
# 'orbit.1807001.2018.292',
# 'orbit.1807001.2018.293',
# 'orbit.1807001.2018.294',
]
We will do this in a class using ``pygeodyn`` so that much of the work to call the code happens behind the scenes.
Construct ``TRAJ.txt`` as an ascii file
[17]:
# %load_ext autoreload
# %autoreload 2
# import sys
# sys.path.insert(0,'/data/geodyn_proj/pygeodyn/utils_pygeodyn_develop/')
# from pre_processing import pygeodyn_PreProcessing
# path_to_preprocessing = '/data/data_geodyn/inputs/icesat2/pre_processing'
# path_to_binaryrvg = '/data/data_geodyn/inputs/icesat2/pre_processing/traj_files_rvg'
# Obj = pygeodyn_PreProcessing(path_to_binaryrvg, path_to_preprocessing, arc1_files)
# # Obj.get_timechopped_rvgdata()
# Obj.make_ascii_traj_txt_file_for_pcefortran()
Call the fortran code with F2PY
[36]:
# import subprocess
# import os
# path_to_data = '/data/data_geodyn/inputs/icesat2/pre_processing/'
# path_to_pygeodyn = '/data/geodyn_proj/pygeodyn/utils_pygeodyn_develop/util_preprocessing/'
# ### CHANGE DIRECTORY to where the fortran code is hosted
# os.chdir(path_to_pygeodyn)
# #### Compile the pce code:
# command_1 = './compile_pce_f.sh'
# subprocess.run(command_1, shell = True)
[37]:
# command_2 = './ExecutePCE.exe > out_pce 2> err_execute'
# subprocess.run(command_2, shell = True)
[40]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.insert(0,'/data/geodyn_proj/pygeodyn/utils_pygeodyn_develop/util_preprocessing/')
from pre_processing import pygeodyn_PreProcessing
path_to_preprocessing = '/data/data_geodyn/inputs/icesat2/pre_processing'
path_to_binaryrvg = '/data/data_geodyn/inputs/icesat2/pre_processing/traj_files_rvg'
Obj = pygeodyn_PreProcessing(path_to_binaryrvg, path_to_preprocessing, arc1_files)
Obj.call_fortran_pce_converter()
The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload
pce_fortran.f compiled
pce_fortran.f executed
The G2B binary file has been saved to:/data/data_geodyn/inputs/icesat2/pre_processing/g2b_pce
[21]:
# !f2py -c -m pce_converter pce_converter.f
# !f2py -h --overwrite-signature fortran_pce.pyf pce_converter.f